Weak error analysis of numerical methods for stochastic models of 

population processes 



The simplest, and most common, stochastic model for population processes, including those 
from biochemistry and cell biology, are continuous time Markov chains. Simulation of such 
models is often relatively straightforward as there are easily implementable methods for the 
generation of exact sample paths. However, when using ensemble averages to approximate 
expected values, the computational complexity can become prohibitive as the number of com- 
putations per path scales linearly with the number of jumps of the process. When such methods 
become computationally intractable, approximate methods, which introduce a bias, can become 
advantageous. In this paper, we provide a general framework for understanding the weak error, 
or bias, induced by different numerical approximation techniques in the current setting. The 
analysis takes into account both the natural scalings within a given system and the step-size 
of the numerical method. Examples are provided to demonstrate the main analytical results as 
well as the reduction in computational complexity achieved by the approximate methods. 

1 Introduction 

This paper provides a general framework for analyzing the weak error of numerical approximation 
techniques for the continuous time Markov chain models typically found in the study of population 
processes, including chemistry and cell biology. The main novelty of this work lies in how the 
analysis takes account of both the natural multiple scalings of a given system and the step-size of 
the numerical method, and is best viewed as an extension of the papers pi 1221 [23]. 

For k G {1, . . . , R}, let Cfc S I^*^ denote the possible transition directions for a continuous time 
Markov chain, and let : M'^ — )• M denote the respective intensity, or propensity functionsj^ The 
random time change representation for the model of interest is then 



'Department of Mathematics, University of Wisconsin, Madison, Wi. 53706, anderson@math.wisc.edu, grant 
support from NSF-DMS-1009275. 

^Department of Mathematics, University of Wisconsin, Madison, Wi. 53706, koyama@math.wisc.edu, grant sup- 
port from NSF-DMS-1009275 and NSF-DMS-0805793. 

°AMS 2000 subject classifications: Primary 60H35, 65C99; Secondary 92C40 

^In the language of probability, the functions are nearly universally termed intensity functions, whereas in the 
language of chemistry and cell biology these functions are nearly universally termed propensity functions. We choose 
the language of probability theory throughout the paper. 



David F. Anderson* and Masanori Koyama^ 



January 12, 2013 



Abstract 




(1.1) 



1 



where the are independent, unit-rate Poisson processes. See, for example, [27], [HI Chapter 6 ], 
or the recent survey [5] . The infinitesimal generator for the model (1.1) is the operator A satisfying 



iAf)ix) = ^X',ix){f{x + a)-fix)), 

k 

where / : M'^ — t- M is chosen from a sufficiently large class of functions. 



The problem of simulating (1.1) (and particularly of approximating expected values) seems 
deceivingly easy since we can simulate the continuous time Markov chains exactly]^ Letting / be 
some function of the state of the system giving us some quantity of interest, we may estimate 
Kf{X{T)) via an ensemble average. 



n 



1 " 

(1-2) 



where Xuj is the ith independent copy of (1.1). The law of large numbers then ensures that 



lim /i„=E/(X(r)), (1.3) 

with a probability of one. However, it is the computational work needed to achieve an accuracy 
with a given tolerance, and not simply the fact that such a limit holds, that is of most interest to 
us. 

1.1 Scalings and computational cost 



The main potential problem in trying to naively apply the limit (1.3) to a given system stems 



from the fact that there is an expected computational cost to the generation of each independent 



realization, which we denote by N for now, and explicitly quantify in (1.7) below. Assuming 
we wish to approximate Kf{X{T)) to an accuracy of e > 0, in terms of confidence intervals, we 
must generate 0(e~^) paths yielding a total computational complexity of order 0{Ne^'^). This 
computational complexity can be substantial when N is large and/or e is small. 

In many models of interest, including many from cell and population biology, we do, in fact, 
have that ^ 1. It is therefore natural to consider how approximation schemes perform. Before 



considering such schemes, however, it is important (from an analytical point of view) to modify ( 1.1 ) 
by incorporating into the model a scaling parameter, N, that can eventually be used to quantify 
N. The value N is usually taken to be the order of magnitude of maxj \Xi\. We then scale the 
process by setting 

j^N ^ N-'^'Xi, 

where ctj is chosen so that Xf^ is 0(1). Defining C,^ = N~°''-(^f^i, the general form of the scaled 
model is then 

R / rt 



X^it) = X^(0) + f^n (a^^ iV^^Afc(A^(s))ds) Cf , (1.4) 
where 7 and Ck are scalars such that 



|Cf| = 0(iV-^''), (1.5) 



^This assumes the pseudo-random numbers generated by modern computers are "random enough" to be considered 
truly random. We take this viewpoint throughout. 
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with \Ck\ ~ ^ for at least one k, and both X'^ and Afc(X^(-)) are 0(1). We exphcitly note 
that we are ahowing for the possibihty that \C^\ ^ N^'^'' for some of the k. Also, note that 



the models (1.1) and (1.4) are equivalent in that one is simply a scaled version of the other. For 
concreteness, the scaling thus described will be carried out explicitly for the stochastic models 
arising in biochemistry in Section [2} 



The infinitesimal generator A for the model (1.4) is 



{A^f){x) = N"'Y, N-'Xk{x){fix + Cf ) - fix)), (1.6) 

k 

where / : M'^ — )■ M is chosen from a sufficiently large class of functions. Note that it is now natural 
to take 

k 

as the order of magnitude for the number of steps required to generate a single path up to a time 
of T > 0. 



The parameter 7 of (1.4) should be thought of as representing the natural time-scale of the 
problem with 7 > implying a relevant time scale smaller than one. In this case of 7 > 0, the 
explicit numerical schemes considered in this paper are usually not a good choice and other methods, 
such as averaging techniques, are usually required in conjunction with the methods described here 
[SI 13 El [13 ES]. This fact is demonstrated by our main analytical results which provide error 
bounds for the different schemes that grow exponentially in . Therefore, our main results are 
most useful when 7 < 0. 



The model (1.4) is henceforth our main model of interest. We make the following running 
assumption, which in light of the fact that both and Xk{X^{-)) are 0(1), is a light one. 



Running Assumption: The intensity functions for the scaled process satisfying ((L4|, 
together with all of their derivatives, are uniformly bounded. 

The above running assumption can almost certainly be weakened to a local Lipschitz condition, 
in which case analytical methods similar to those found in [24J and/or [28j can be applied. Proving 
our main results in such generality, while possible and certainly worth doing in future work, will 
be significantly messier and we feel the main points of the analysis will be lost. 

We return to our problem of interest and let / be a function of the state giving a quantity of 
interest and consider how to approximate E/(X^(T)). As already discussed, the computational 
cost of approximating E/(X^(T)) to an accuracy of e, in the sense of confidence intervals, using 



the estimator (1.2) is 0{Ne ■^). Suppose now that is an approximation of X^ constructed 
with a time-discretization step of size /i > o|^ Letting denote independent copies of , we 

jE/(^Sm)- (1-8) 



construct the estimator 



n 

Suppose that it can be shown that the approximation scheme has a weak error, or bias, of order 
one. That is, 

E/(x^(r))-E/(z^(r)) = o(/i), 

^As the approximate process explicitly depends upon the choice of /i, we could denote it as . However, for ease 
of exposition we choose to drop the h dependence from the notation. 
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for a suitably large class of functions /. Then, noting that 



Ef{X^{T)) -fln= [E/(X^(r)) - Ef{Z^{T))] + [E/(Z^(r)) 



we see that we must choose h = 0{e) to make the first term on the right, the bias, 0(e), and 
n = 0{e~^) to make the second term, the statistical error, have a variance of O(e^), and a standard 
deviation of 0(e). This gives a total computational complexity of 0(e~^). This will greatly lower the 
computational complexity of the problem, as compared with using exact sample paths, if e^^ <^ N. 
If, instead, the method for generating is second order accurate in a weak sense, that is if 

Ef{X^{T)) - E/(Z^(T)) = 0(/i2), 

then we could choose h = 0(e^/^) to yield a bias of 0(e). This leads to a total computational 
complexity of 0(e~^'^), which for small e represents a substantial improvement over using an order 
one method. 

The above discussion points out that the key quantity to understand for a given approximation 
method, and the focus of this paper, is the bias, or weak error, it induces for a given function /: 

Bf{Z'',x,t) ^='E,./(X^(t))-E,./(Z^(t)). (1.9) 

Note that Bf{Z^ , x, h) represents the local, one-step error of the method as the fixed time-step is 
of size /i > 0. Analyzing the bias induced by different numerical schemes is by now classical in the 
study of stochastic processes, with nearly all the focus falling on how the bias scales with the size of 
the time-step, h [26J. However, it is not sufficient in the current setting to simply understand how 



the bias (1.9) scales with the time-discretization alone. Care must also be taken to quantify how 
the leading order constants depend upon the natural scalings of a given system and given method, 
here quantified by the parameter A'^ > 0. For example, if 

E/(A^(r)) - E/(Z^(r)) = 0(cf /i + c^/i2), 

then we wish to understand how c^, depend upon N since for a given choice of h we may have 
that c^h < c^K^. In this case, the method will behave as if it is an order two method until h is 
reduced to the point when c^h > c^/i^, in which case it will behave like an order one method. 

1.2 Notation and terminology 

In this short subsection, we collect some necessary extra notation and terminology used throughout 



the paper. We first note that for / : M — )• M, and any t>0, Dynkin's formula for the process (1.4) 
is ^ 

E,./(A^(t)) = fix) + E,. / A''fiXis))ds, (1.10) 

Jo 

which holds so long as the expectations exist. Similar expressions will hold for the approximate 
methods under consideration. Dynkin's formula will be our main analytical tool as it will allow us 



to quantify the bias (1.9) for the different methods and we therefore focus on developing compact 



notation for the generators of our processes. 

We define the operator for the kth. possible transition, which we will typically call a "reac- 
tion" in keeping with the motivating application of Section [2j via 

V^/(x) N^>^{f{x + C^)-f{x)). (1.11) 
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Note that if / is globally Lipschitz, then f{x) is uniformly bounded over k and x since 
0{N~'^''). We may now write (1.6) as 



(1.12) 



Defining the vector valued operators 

where we recall that R is the number of reactions, we obtain 

(^^/)(x) = (An'A.V^)/(x). 
For i £ {1, . . . , d} and k G {1, . . . , R}, we let nik satisfy 

|Cf| = iv-'"'=. 

Note that, by construction, we have Ck < mi^, for all k. Finally, we denote the jth directional 
derivative of / into the direction [vi,V2, ■■■Vj] by f'[vi, ■■■,Vj] and make the usual definition 



sup{/'[ui, ....,Vj]{x), \\v\\ = 1} 



(1.13) 



1.3 Summary of main results. 

The following list is a summary of our main results. Technical details and assumptions have been 
omitted from the statements below for the sake of clarity. 



1. In Theorem 4.1, we prove that for any explicit numerical scheme with a step-size of /i > 0, 

Bf{Z^,x,T) = 0{Th~^sup\Bf{Z^,z,h)\), 



where Bf{Z'^ ,x,t) is the bias defined in (1.9). Thus, if the numerical scheme has a local, one- 
step, error of 0{hP~^^), then the global error is 0{hP). This result is standard and should be 
compared to similar results in [6l [22] . It is included here since it is necessary to show that the 
scalings do not alter the usual result. 



2. In Theorem 



5.5 



we prove that if is generated via Euler's method, also known as explicit 



r-leaping in the setting of biochemistry, then 

Bf{Z^,x,h) = 0{ch'^) 



where c is independent of A^. Thus, after applying Theorem 4.1 , Euler's method is proven to be 
an order one method in that the leading order of the global error satisfies 



Bf{Z^,x,T) = 0{ch), 



and decreases linearly with the step-size. This fact is formally stated in Theorem 6.4 
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3. In Theorem 5.6, we prove that if is generated via an approximate midpoint method, then 



Bf{Z^,x,h) = 0{c^h^ + c^h^), 

where c{^,c^ depend upon the natural scaHngs of the system, quantified here by > 0. The 
term that dominates this error then depends upon the specific scalings of a system, encapsulated 



in the constants Ci and C2 , and the size of the time discretization h. Theorem 4.1 then implies 

Bf{Z^„x,T) = 0{c^h + c^h''), 
and the midpoint method will sometimes behave like a first order method, and other times will 



behave like a second order method. This fact is formally stated in Theorem 6.5 A transition 
point, in terms of and h, for this change in behavior is also provided. 



4. In Theorem 



5.8 



we prove that if Z^^^^ is generated via the weak trapezoidal method, which was 
originally formulated in the diffusive setting \Ql and is extended to the discrete setting in Section 
U then 

Bf{Z^,,^,x,h) = 0{ch^), 



where c is independent of N. Thus, after applying Theorem |4.1[ the Weak Trapezoidal method 
is proven to be a second order method in that the leading order of the global error satisfies 



Bf{Z,%,x,T) = 0{ch'), 



and decreases quadratically with the step-size. This fact is formally stated in Theorem 6.7 



1.4 Context 

We attempt to put the present work in the correct historical context. In [3], Anderson, Ganguly, 
and Kurtz provided the first error analysis of different approximation techniques that incorpo- 



rated the natural scalings of the system (1.1) into the analysis. Specifically, they considered 
models satisfying the "classical scaling," which using our present terminology corresponds with 
7 = 0, Cfc = 1, and Oi = 1. They further coupled the time discretization to the scaling, thereby 
ensuring h was always in a useful regime, and derived results for both the weak and strong error 
of Euler's method and the midpoint method. They proved that, in this specific setting, Euler's 
method is an order one method in both a weak and a strong (in the norm) sense. They 
proved that the strong error of the midpoint method falls between order one and two (see [3] for 
precise statements) , and that the leading order term of the weak error of the midpoint method 
scales quadratically with the step-size. 

In [22] it was shown by Hu, Li, and Min that the OQi^) weak convergence rate of the midpoint 
method given in depended intimately on the coupling between the time discretization and 
the scaling parameter of the system. It was this particular observation that in a large part 
motivates the present work as we wish to provide a general analytical rate of convergence for 
the different relevant methods in the most general possible scaling regime and in which the time 
discretization parameter is independent of the natural scalings. 

In [6] the weak trapezoidal algorithm was introduced in the context of SDEs driven by Brownian 
motions. There, it was shown to be an easy to implement method that is second order accurate 
in a weak sense. Further, it has the nice property that no costly derivatives need be computed 
during the course of the simulation. In ^23j , it was pointed out that since the motivation for the 
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original weak trapezoidal algorithm comes from viewing SDEs as driven by space-time Wiener 
processes, the exact same algorithm can work in the present jump setting by viewing the driving 
forces as space-time Poisson processes. This observation was also made independently in an 
earlier version of the present paper. 

It is worth pointing out that there are at least two other trapezoidal type algorithms in the 
literature pertaining to models of stochastic chemical kinetics. These are the implicit and explicit 
trapezoidal methods of [11] . These method were explicitly developed to give better stability than 
the usual methods, and so do not exhibit better convergence than does Euler's method. 



1.5 Paper outline 

The remainder of the paper is organized as follows. In Section [2| we show how the basic models 
considered in this paper, both (1.1) and (1.4), arise naturally in biochemistry, which is the main 



area of motivation for this work. This section can safely be skipped by anyone not interested in 
that application. In Section [3j we discuss numerical methods for the models under consideration, 
including both exact and approximate schemes. In Section [41 we prove Theorem 4.1, as stated 



loosely above, and relevant corollaries. In Section [5} we prove Theorems 5.5 5.6, and 5.8 each 
stated loosely above, providing the local, one-step errors induced by the approximate schemes 
considered here. In Section [6j we provide bounds on the semigroup operator of the exact process 
, yielding the final piece to the global analysis of the weak error of the different methods. We 
also briefly discuss stability concerns in Section [6| In Section [TJ we provide relevant examples. 



2 Motivating Systems: Biochemical Reaction Networks 



This section builds the relevent models (1.1) and (1.4) used in the study of stochastically modeled 
biochemical reaction networks. We feel it is worthwhile to include this section as this is the area 
of main motivation for the present work. However, it can safely be skipped by those wishing to 
simply see the mathematical analysis and not the areas of application. 



2.1 The unsealed model 

A chemical reaction network is a dynamical system involving multiple reactions and chemical 
species. The simplest stochastic models of such networks treat the system as a continuous time 
Markov chain with the state, X G ^>0) giving the number of molecules of each species and with 
reactions modeled as possible transitions of the chain. 
An example of a chemical reaction is 

2S1 + S2 ^ 53, 

where we would interpret the above as saying two molecules of type Si combine with a molecule 
of type 52 to produce a molecule of type 53. The Si are called chemical species. Letting 





















, and (i = u[- vi 














we see that every instance of the reaction changes the state of the system by addition of C,i. Here 
the subscript "1" is used to denote the first (and in this case only) reaction of the system. 
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In the general setting we denote the number of species by d, and for z S {1, . . . , d} we denote 
the ith species as Si. We then consider a finite set of R reactions, where the model for the kth. 
reaction is determined by 

(i) a vector of inputs Uf. specifying the number of molecules of each chemical species that are 
consumed in the reaction, 

(ii) a vector of outputs v'f^ specifying the number of molecules of each chemical species that are 
created in the reaction, and 

(iii) a function of the state A'^ that gives the transition intensity, rate, or propensity at which the 
reaction occurs. 

Specifically, if we denote the state of the system at time t by X{t) € Z*^, and if the kth reaction 
occurs at time t, we update the state by addition of the reaction vector 

dci I 



and the new state becomes X{t) = X{t—) + ^fc. For the standard Markov chain model, the number 
of times that the A;th reaction occurs by time t can be represented by the counting process 



Rk{t)=Yk(^l\', 



{X{s))ds 



where the Yk are independent, unit-rate Poisson processes [27], [TU Chapter 6 ]. The state of the 
system then satisfies 

X{t) = X{0) + J2 Yk (^1^ X'k{X{s))ds^ Cfc, 



which was (1.1) in the Introduction. The above formulation is termed a random time change 
representation and is equivalent to the chemical master equation representation found in much of 
the biology and chemistry literature, where the master equation is Kolmogorov's forward equation 
in the terminology of probability. 

A common choice of intensity function for chemical reaction systems is that of mass action 
kinetics. Under mass action kinetics, the intensity function for the fcth reaction is 

d 



7 • 



(2.1) 



where i^ki is the ith component of Uk- 

Example 1. To solidify notation, we consider the network 

'''1 K, 

where we have placed the rate constants above or below their respective reactions. For this 
example, equation (1.1) is 



1 




X{t) = X{0) + Yi (^J^ KiXi{s)ds^ 

+ Y3 (^j\sX2{s){X2{s) - l)ds 



K2X2{s)ds 
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Defining (^i = [—1, 1, 0]'^, C2 = [1) —1) 0]'^, and Ca = [0, —2, 1]-^, the generator A satisfies 

{Af){x) = KiXiifix + Cl) - fix)) + K2X2{f{x + C2) " f (x)) + K3X2{x2 " + Cs) " f{x)). 



2.2 Scaled biochemical models 

The scaling described below has been used previously in at least [H 13 El [25] . We emphasize that the 
scaling is an analytical tool used to understand the behavior of the different processes, and that the 
actual simulations using the different methods make no use of, nor have need for, an understanding 
of N, a, or the (3^- 

Let ^> 1 be a natural parameter of the system, perhaps the abundance of the species with 



the highest number of molecules. Assume that the system satisfies (1.1) with A', determined via 



mass-action kinetics (2.1), and £ ^ representing the reaction vectors described in Section 



2.1 



For each species i, define the normalized abundance (or simply, the abundance) by 

Xf (t) = iv-«x,(t), 

where ai > should be selected so that is 0(1). Here X^^ may be the species number (oj = 0) 
or the species concentration, or something else. 

Since the rate constants may also vary over several orders of magnitude, we write = k^N^^ 
where the (3k are selected so that = 0(1) (recall that is the original system parameter). 
Note that while the ai are non-negative if is chosen to be the abundance of the species with the 
highest number of molecules, Pk can be positive, negative, or zero. 

Under the mass-action kinetics assumption, we always have that A^(X(s)) = -/V^'="'"'^'='"Afc(X^(s)), 
where A^ is deterministic mass-action kinetics with rate constants Kfc [3 [71 [23]. Our model has 
therefore become 

X^(t) = X^(0) + Y,yk NP'^+''^-'^\k{X''{s))ds^ , i G {1, . . . , 4, (2.2) 

where = N~°'^(^ki- To quantify the natural time-scale of the system, define 7 € M via 

7= max {j3k + Vk ■ a - di] , 

where we recall that i/^ is the source vector for the kth reaction. Letting 

Cfc = + t'fc • a - 7) 



for each k, the model (2.2) is seen to be exactly (1.4) 



Remark 2.1. If /J^ + • a = = 1 for all i, k in (2.2), in which case 7 = 0, then we have what is 
typically called the classical scaling. It was specifically this scaling that was used in the analyses of 
the Euler and midpoint methods found in [31 I23j . In this case it is natural to consider X^ as a 
vector whose ith component gives the concentration, in moles per unit volume, of the ith species. 

Example 2. ^45 an instructive example, consider the system 

100 
Si «^ 52 
100 
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with Xi{0) = X2{0) = 10,000. In this case, it is natural to take N = 10,000 and ai = 02 = 1- 
the rate constants are 100 = 10,000, we take /3i = /32 = 1/2 and find that 7 = 1/2. The equation 
governing the normalized process IS 

X^{t) = (0) - Y, [n^^N 1 + (iV^/^iV - X^{s))ds^ 1 

where we have used that Xf + X2 = 2. 



3 Numerical methods 
3.1 Exact methods. 

As already discussed in the introduction, because we are considering continuous time Markov chains, 
there are a number of numerical methods available for the generation of exact sample paths for 



the model (1.1), or the equivalent model (1.4). All are examples of discrete event simulation 
In the language of biochemistry these methods include the stochastic simulation algorithm, best 
known as Gillespie's algorithm in this setting I18|. the first reaction method [17j, and the next 
reaction method [U [16] . All such algorithms perform the same two basic steps multiple times until 
a sample path is produced over a desired time interval: conditioned on the current state of the 
system, both (i) the amount of time that passes until the next reaction takes place. At, is computed 
and {a) the specific reaction that has taken place is found. Note that At is an exponential random 
variable with a parameter of Xk{X{t)). Therefore, if 

j;A,.(X(t))«]V»l so that EAt = ^^-i^— « 4 « 1' (^.l) 

then the runtime needed to produce a single exact sample path may be prohibitive when coupled 
with Monte Carlo techniques, and approximate methods may be desirable. 

3.2 Approximate methods. 

There will be times we will wish to discuss an arbitrary approximation to X or X^ , and other times 
we will wish to consider specific approximations. When we consider an arbitrary approximation 
we will simply denote the approximation as Z or . When we distinguish the Euler, midpoint, 
and Weak Trapezoidal approximations, the main approximations under consideration here, we will 
denote by Ze^ Zm, and Ztrap the respective approximations to X, and by Z^, Z^, and Z^^^ the 
respective approximations to X^ . Throughout, our time-discretization parameter will be denoted 
by /i > OQ 

3.2.1 Euler's method 



The Euler approximation, Ze^ to the model (1.1) is the solution to 

rt 



ZE{t) = Ze{^) + E (/ ^'k^^E ° ^(^))^«) Ck, (3.2) 



■^Historically, the time discretization parameter for the methods described in this paper have been r, thus giving 
these methods the general name "r-leaping methods." We choose to break from this tradition and denote our 
time-step by h so as not to confuse t with a stopping time. 
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where r/(s) ^ ^ ^' other notation is as before. Note that ZE{r]{s)) = ZE{tn) if < s < 

tn+i- The basic algorithm for the simulation of (3.2) up to a time of T > is the following. For 
a; > we will write Poisson(3;) for a Poisson random variable with a parameter of x. 

Algorithm 1 (Euler's method). Fix /i > 0. Set Ze{0) = xq, to = 0, n = and repeat the following 
until tn+i = T: 

(i) Set tn+i = tn + h. If > r, set tn+i = T and h = T — tn- 

(ii) For k G {!,...,/?}, let = Poisson(A'^(Z£;(t„))/i) be independent of each other and all 
previous random variables. 

(iii) Set ZE{tn+i) = Zs(t„) + J2k ^kCk- 

(iv) Set n n + 1. 

The above algorithm is termed explicit tau-leaping in the biology and biochemistry literature 
|19) . Several improvements and modifications have been made to the basic algorithm described 
above over the years in the context of biochemical processes. Many of the improvements are 
concerned with how to choose the step-size adaptively |10t[20] and/or how to ensure that population 
values do not go negative during the course of a simulation [21 [HI [12] , which is a relevant issue as 
population processes have a natural non- negativity constraint. For the simulations carried out in 
Section [7j we choose to simply keep a fixed step-size and set any species that goes negative in the 
course of a jump to zero. 

Defining the operator 



(B./)(x) ^=^^AU^)(/(x + a)-/(x)), 



(3.3) 



we see that for t > 



EfiZEit)) = EfiZE o r]{t)) + E / {Bz^,r,it)f){ZEis))ds 



n{t) 



(3.4) 



so long as the expectations exist. The scaled version of (3.2), which is an approximation to X 
satisfying (1.4), is 



N 



(t) = Z^ (0) + Y,yk [n^ N^'\k{Z^ 



(3.5) 



where all notation is as before. Define the operator by 



B^f{x) = {N^X{z)-V")f{x). 



If Zj^ satisfies (|3^, then for alH > 

E/(Zf(t))=E/(Zf(,?(t))) + E 
so long as the expectations exist. 



rj{t) Eyi\ " 



(3.6) 
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3.2.2 Approximate midpoint method 

A midpoint type method was first described in [loj^ and analyzed in [3] and [22j . Define the 
function ^ 

p{z) = 2+ -/l^Afc(^)Cfc, 



which computes an approximate midpoint for the system (1.1) assuming the state of the system is 
z and the time-step is h. Then define Zm to be the process that satisfies 

ft 



ZmH) = Za/(0) + Yk (^j^ X'k o p{Zm o r]{s))ds^ Ck- 



(3.7) 



The basic algorithm for the simulation of (3.7) up to a time of T > is the following. Note 
that only step (ii) changes from Euler's method. 

Algorithm 2 (Midpoint method). Fix h > 0. Set Zjv/(0) = xq, Iq = 0, n = and repeat the 
following until tn+i = T: 

(i) Set tn+i = tn + h. If tn+i > T, set tn+i = T and h = T — tn- 

(ii) For k £ {1, . . . , R}, let = Poisson(A'^ o p{ZM{tn))h) be independent of each other and all 
previous random variables. 

{in) Set ZM{tn+l) = ZM{tn) + Y^k^kCk- 

(iv) Set n n + 1. 



For Bz defined via (3.3), any t > 0, and Zm satisfying (3.7), we have 



Ef{ZM{t)) = EfiZM o v{t)) + E / {BpoZ,,o^it)f ){ZM{s))ds, 

Jriit) 



satisfying (1.4), is 



so long as the expectations exist. The scaled version of (3.7), which is an approximation to A^ 

(3.8) 



where now 



Z^j{t) = Z^M + Y,Yk (a^ N'^Xk o p{Z^j o ,^{s))ds^ 
p[z) = z + hN^Y.N^>'X,{z)Ci^. 



N 
k J 



While we should write p in the above, we repress the "A" in this case for ease of notation. For 
defined via (3.6), and Z^j satisfying (3.8), we have 

E/(Z^,(t))=E/(Z^,(r?(t)))+E f {B^ f){zZ{s))ds, 

for all t > 0, so long as the expectations exist. 

^The midpoint metiiod detailed in ^19j is actually a slight variant of the method described here. In [19] the 
approximate midpoint, called p{z) above, is rounded to the nearest integer value. 
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3.2.3 The weak trapezoidal method. 



We will now describe a trapezoidal type algorithm to approximate the solutions of (1.1) and/or 
( 1.4 ). The method was originally introduced in the work of Anderson and Mattingly in the diffusive 
setting, where it is best understood by using a path-wise representation that incorporates space- 
time white noise processes, see 0. It has independently been extended to the current jump setting 
in [23] where it was studied in the classical scaling (7 = 0, = 1, = 1, with the step size coupled 
to the system size similarly to the analysis in [3]). 

In the algorithm below, which simulates a path up to a time T > 0, it is notationally convenient 
to define [x]"*" = a; V = max{x, 0}. 

Algorithm 3 (Weak trapezoidal method). Fix h > 0. Set Z{0) = xq, to = 0, and n = 0. Fixing a 
e G (0,1), we define 

t def 1 1 dcf 1 (1 - 0)^ + „v 

= 2W^) = 2 e{i-e) ■ ^^-^^ 

We repeat the following steps until = T, in which we first compute a 0-midpoint y* , and then 
the new value Ztrap{tn+i)'- 

(i) Set tn+i = tn + h. If > r, set tn+i = T and h = T — tn- 

(a) For k £ {1, . . . , R}, let A^ i = Foisson{X'f^{Ztrap{'tn))Oh) be independent of each other and all 
previous random variables. 

{iii) Set y* = Ztrap{tn) + J2k ^k,lCk- 

(iv) For k e {1,.. . , R}, let Afc^2 = Poisson([,^iA'^(y*) - C2>^'k{tn)]~^ {'i- - 0)h) be independent of each 
other and all previous random variables. 

{v) Set Ztrap{tn+l) = V* + Y.k ^k,2Ck- 

(vi) Set n n + 1. 

Remark 3.1. Notice that on the (n + l)st-step, y* is the Euler approximation to X{nh + 9h) 
starting from Ztrapitn) at time nh. 

Remark 3.2. Notice that for all 6* € (0, 1) one has .^1 > ^2 and ^1 — ^2 = 1- 
We define the operator ^^1,22 by 

(i3.„.,/)(x) J^KiAU^i) - 6AU^2)]+(/(x + a) - f{x)). 

k 

Then, for r]{t) < t < r]{t) + 9h, the process Ztrap satisfies 

Ef{Ztrap{t)) = EfiZtrapivm + ^ / {Bz,^^^i^it))f){Ztrap(.s))ds, 

where we recall that Bz is defined via (3.3), and for r]{t) + 6h < t < 7]{t) + h, the process Ztrap 
satisfies 

Ef{Ztrap{t)) = Ef{Ztrap{v{t) + Oh)) +E [ (%™,C,(t)+e/.),Z,™,(,(t)) /) (^trap(5))^is. 

Jri{t)+eh 
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Finally, define the operator B^^ by 



where for some 6 G (0,1), and ^2 satisfy (3.9), and for u G M'^ the zth component of is 
[vi]'^ = max{t;j,0}. Then, if Z^.^p represents the approximation to (1.4) via the weak trapezoidal 
method, for ry(t) < t < r/(t) + Oh 



Ef{Z^.,^{t))=Ef{Z[^.,^{rjm+E f {B 



whereas for rj{t) + Oh <t < i]{t) + h 



^fiZilapit)) = + Oh)) + E / (^^V (,(t))/)(^/rap(^))^«- 

4 Global Error from Local Error 

Throughout the section, we will denote the vector valued process whose zth component satisfies 



(1.4) by ^ and denote an arbitrary approximate process via Z^^ . Also, we define the following 



N 



semigroup operators acting on / G Co(M'^,M): 

Vtfix) E,/(X^(t)) 
Ptfix) E,/(Z^(t)), 



(4.1) 



where for ease of notation we choose not to incorporate the notation N into either Vt or Pt. 
Returning to the notation introduced in Section [T| we note that 



Bf{Z'\x,t) = E,f{X'^t)) - E,J{Z'\t)) = {Vt-Pt)f{x). 

We may therefore interpret the difference between the above two operators, for t G [0,T], as the 
weak error, or bias, of the approximate process Z^ on the interval [0, T]. As /i > is our time-step, 
we note that Bf{Z^ , x, h) = [Vh — Ph)f{x) is the one step local error. 

Definition 1. Let n be an arbitrary non- negative integer, and be a m dimensional vector of 
C(M'^,M) valued operators on C(M'^,M), with its ith coordinate denoted by Ade- Then we define 



^ = sup 



\i=l 



, 1 < < ""T., p < n 



For example, if j, k, i £ {1, R} then 



Kvfvfvf /)(x)| < ii/iir, 



where we recall that is defined in (1.12). Note that, for any A^, 

\\J 

Also note that, by definition, for n > 



M 




M < II fiiM 



n+1- 



(4.2) 
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Definition 2. Suppose M : C(]R^,M) C(M'^,M^) and Q : C(M'^,M) C(M'^,M) are operators. 
Then define 



||C^||j^£ - _ sup ■ 



|A4 _ 

Note that as stated in the Introduction, the purpose of this paper is to derive bounds for the 



global weak error of the different approximate processes, which, due to (|4.2|), consists of deriving 



bounds for ||(P" — Vnh)\\m-^o, appropriately defined ^A and a reasonable choice of m > 0. 



Theorem 4.1 below quantifies how the global error ||(P^ — Vnh)\\rn-^o '^^^ ^e bounded using the 



one-step local error — 'PhWm-^o- Section [5| we will derive the requisite bounds for the local 
error for each of the three methods. 

Theorem 4.1. Let M. he a C(M'^,M^) valued operator on C(M'^,M). Then for any n,m> 0, and 
h>0 

£&{!,.. .,n} 



Proof. Let / G Co(M'^,M). Note that, since \\g\\o = \\g\\^ for any g, 

||pi-l||A^ IIP T) \\M — llpJ^^II IIP T> W-M 

With this in mind 

n 

upf: - Vnh)f Wo = II E(^'^Mn-j-) - ^r'^M-j-+i))/llo 

n 



X || ^iiX 
m— >m I 



Since Ph is a contraction, i.e. ||P/j||o-s.o ^ !> the result is shown. □ 
The following result, where replaces A4 in Theorem 4.1 is now immediate. 



hd 



Corollary 4.2. Under the same assumptions of Theorem 4-1 o,nd with f G 

ll(^h"-^n.)/|ir = 0(^11^/. -^/.Il^'o, max UWiHfWr}). 

£e{l,...,n} 

The following generalization, which allows for variable step sizes, is straightforward. 
Corollary 4.3. For f G C^{R'^,R) 

||E,/(ZtJ-E,/(XtJ|U = 0(n.max - P/.J|^" ol , max A\\VtJ\\T}). 

1=1, ■■■,n £e{l,...,n} 

Thus, once we compute the local one step error ||P/t — VhWm^o approximate process, we 

have a bound on the global weak error that depends only on the semigroup Vt of the original process. 
We will delay discussion of for now, as this term is independent of the approximate process. 

Instead, in the next section we provide bounds for — Vh\\m->o each of the three methods 
described in Section [Sl 
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5 Local errors 



Section 5.1 will present some necessary propositions and lemmas. Sections 5.2 5.3, and 5.4 will 



present the local analyses of the Euler, midpoint, and weak trapezoidal methods, respectively. 
5.1 Analytical tools 

Proposition 5.1. Let f e C^{R'^,R^). For any k £ {1, . . . , R} 

Vf/GO(iV^^-™^-||/||i)cO(l). 
In particular, N^'^'^V'^ f is bounded. 

Proof. The result follows from the fact that for any w ^M.'^ 

\f{x + w)-f{x)\ < \w\ 

Define, for any multi-subset / of {1, i?}. 



□ 



1=1 



so that, 



sup ||Vf /Hoc. 
\I\<n 



Proposition 5.2. Let f G C^(M'^,]R^). Then, 

\r = o{\\m- 



Proof. The case j = 1 follows from Proposition 5.1 Now consider V| /(x) for a multi-set I of 



{1, . . . , R}, with |/| = j > 2. If ruk > for all k £ I, the statement is clear. If on the other hand, 
rrik = for some k G I, then for this specific k, we have < and 

||vf /lu < 2iv^MlvVl|oo = OdI/lb-i) = 0(11/11,), 

where the second to last equality follows by an inductive hypothesis. □ 
We make some definitions associated with V^. Let g : M'^ — t- M^. For i,j £ {1, . . . , R} 

[(y^)%^ Vf Vf (5.1) 

diag{N^) = diag{N^\...,N^^). 
Also, we define to be the R dimensional vector whose entries are all 1. 
Lemma 5.3. (Product Rule) Let g,q -.W^ ^ be vector valued functions. Then 
Vf (5 • q){x) = {V^g . q){x) + {g ■ V^q){x) + N'^^iV^g • Vf g)(x). 

Also, 

V^(g • q)ix) = [D^'gfqix) + [D^'qfgix) + diagiNT\[D'' gf x [Z)^g]^)(rE)l^/. 
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Proof. Note that, for any k, 

V^{9 ■ q)ix) =N^-{g{x + C^)q{x + Cf ) - g{x)q{x)) 

= N^^{g{x + Cf ) - 9{x))q{x) + N^''{q{x + Cf ) - q{x))g{x) 
+ N-^'^N'^" {q{x + Cf ) - (z(x))iV-'= {g{x + Cf ) - ^(x)) 
= (V^g) ■ q){x) + {V^q ■ g){x) + N'^^iV^g • V^g)(x), 

verifying the first statement. To verify the second, one simply notes that the above calculation 
holds for every coordinate, and the result follows after simple bookkeeping. □ 

Corollary 5.4. Let A : M'^ — > M"^ he a vector valued function, and / : M'^ — > M. Then 
(A • V^/)(x) = (Vf A • V^)/ + A • V^Vf / + iV-^^Vf A • V^V^/. 

Also, 

V^(A • V^/) = [D^A]^V^/ + [(V^)V]A + dtag{N')-\[D''\ x (V^)^]!^/. (5.2) 
Proof. Simply put g = X and q = V^/, and note that is symmetric. □ 

5.2 Euler's method 



Throughout subsection 5.2, we let be the Euler approximation to X'^, and let 

PE,hfix) E,/(Zf(/i)), 
where h is the step-size taken in the algorithm. Below, we will assume h < N^"', which is a natural 



stability condition, and is discussed further in Section |6.2 
Theorem 5.5. Suppose that the step size h satisfies h < N^'^ . Then 

\\PE,h-n\\^Zo = o{N^''h^). 

Proof. For Euler's method with initial condition xq, 

PE,kf{xo) = fixo) + hB^Jixo) + y + 0{N'^\\f\\rh% (5.3) 



where, noting V A(xo) = and using the product rule in Lemma 5.3, we have 



</ = iV^A(xo)-V^/ 
«)2/ = iV^A(:Eo) • V^(iV^A(xo) • V^/) 

= N^U{xof[{V''ff]X{xo). (5.4) 



On the other hand, for the exact process (|1.4| 



Vnfixo) = fixo) + hA^'fixo) + y (-4^)V(xo) + OiN'''\\f\\rh'), (5.5) 

where, again, 

A^f = N^X ■ V^/. 
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Noting that, 

(A'^ffix) = N'^X • V^(A • V^/(x))) 

= iV2^A^([Z)^A]^V^/(x) + [(V^)2/]A(x) + N^W^idiag{N-^)[D''X x (V)^]!^/) 

(5.6) 



and defining 



a{x) = iV2^A^[Z)^A]^V^/(x) 

b{x) = iV2^A^[(V^)2/]A(x) 

c(x) N^U'^[diag{N-')[D''Xx{V''f]lnf{x)], 



we can write 



Vhfixo) = /(xo) + /i^^/(xo) + -(a(xo) + 6(xo) + c(xo)) + 0(iV37||_^||V^;,3)^ 



Note that BZfixo) = A^f{xo) and 6(xo) = {Bgrfixo). We may then compare (5.3) and (5.5) 



(«)2/(xo) - (a(xo) + 6(xo) + c(xo))) + 0(iV3^||/||r/^') 



- Vh)f{xo) - 2 vv-xo 

= y (-aM - c(xo)) + 0(iV3^||/|ir/^')- 
The term a(x) +c(x) = ©(iV^^H/H^ ) is clearly non-zero in general, giving the desired result. □ 
5.3 Approximate midpoint method 



Throughout subsection 



5.3 



we let Z^j be the midpoint method approximation to , and let 

PM,hfix) = E^fiZ^iih)), 
where h is the step-size taken in the algorithm. As before, we will assume h < N^"', which is a 



natural stability condition, and is discussed further in Section 6.2 



Theorem 5.6. Suppose that the step size h satisfies h < N . Then 



Remark 5.7. Theorem |5.6| predicts that the midpoint method behaves locally like a third order 
method and globally like a second order method if /i is in a regime satisfying N'^h ^ ]V~™''^^'"'=^, 
or equivalently if /i ^ ]S[-i-"^^'^{'m-k} _ This agrees with the result found in [3| pertaining to the 
midpoint method, which had 7 = 0, rrik = 1, and the running assumption that h S> This 
behavior is demonstrated via numerical example in Section [7} 



Proof, (of Theorem 5.6) Let C,^ denote the matrix with kth column , i.e. 
Recall that p is defined via 



h 



+ i,N^^Xk{z)N'^'^C, 



N 
k ■ 
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After some algebra, we have 

k 

= NUixo) ■ V^/(x) + wixo) + OiN'^WnY^'h^). 

where 

nj{x) N^^^[DX{xo)][('']diag{N')X{xo)-V^f{x). 



Next, using the product rule (5.2), we see 



K^^'fi^) = ^"^(^0 + ^[C'']diag{N-)X{xo)) ■ V^(iV^A(:Eo + ^[C'']diag{N-)X{xo)) ■ V^/)(x) 

= N^Uixo + ^[C^]dia5(iV^)A(xo))^[(V^)V]A(xo + ^[C'']diag{N')X{xo)) ■ V^/)(x) 
= g{xo) + 0{N^''\\f\\rh), 

where 

g{xo) = N^U{xof[{V''ff{x)]X{xo). 
Therefore, since N^X{xo) ■ V^/(xo) = A^fixo), it follows that 

PM,hf{xo) = f{xo) + hB^^^^^fixo) + y (^p';.o))V(xo) + 0{N^^f\\r h') 
= f{xo) + h f^^/(xo) + w{xo) + 0{N^^\\f\\rh') 



Recall that 



where 



(^^)2/(x) = a(x)+6(x) + c(x), 

a(x) = iV2^A^[Z)^A]'^V^/(x), 
6(x) = iV2^A^[(V^)2/]A(x), 

c{x) = N^U'^[diag{N-')[D''X x (V^)2]l^/(x)], (5.7) 



and 



Vhfixo) = f{xo) + hA^fixo) + y (a(xo) + b{xo) + c(xo)) + 0(iV3^||/||r^')- 
Noting that b{xo) = g{xo), we see 

{PM,h - Vh)f{xo) = hw{xo) + y {g{xo) - (a(xo) + b{xo) + c(xo))) + 0{N^^\\f\\rh^) 
= {hw{xo) - y a(xo)) - y c(xo) + 0{N^^f\\^h'). 
We will now gain control over the terms {hw{xo) — ^a(xo)) and ^c(xo), separately. 
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(5.^ 



Handling ^c(xo) first, we have that V^A^ G 0{N''''-'^^), and so 

c(xo) = 0(iV2^-'"'^{'»^^||/||r). 



Next, we will show that 



We have 



hw{xo) - ya(xo) = 0(iV2,-minW}||/||V-;^2) 



hw{x,) - ^a(xo) = ^N^'^[D\{xoW\diag{N-)\{xo) ■ V^/(xo) - y iV2^A^[Z?^A]^V^/(x) 
= ^iV2T(^[m(xo)][C^]dm5(iV'=) - [D^A(xo)])a(xo) • V^/(xo). 

(5.9) 



By Proposition 5.2, V^f{x) is bounded by • Therefore, we just need to show that the 

difference between the two square matrices 

[D^X{xo)] and [DX{xo)][C^]diag{N^) (5.10) 

is ©(iY-i^Wi^fe}). Recalling the definitions in ( [slj ), the (i,i)th entry of the left side of ( |5.10| ) is 

iV^^(A,(xo + Cf)-Ai(xo)) 



whereas that of the right side of (5.10) is 



Also, note that, for A G C^{R'^,R), 



{{X{x + v)- X{x)) - VA(x) • v) G 0{\v\'\\X\\2). 

where 

||A||2 = sup{||A||oo, ||9^,A||oo, \\dxjdx^X\\oo,i,j, k < d.} 

Since ||Afc||2 is bounded for any k, the difference between the (i, j)th entries of the two expressions 
in dslUl ) is 

Also, recall that cj — ruj < 0. Thus the above is also 



Therefore (5.9) is of order 



as desired. Combining the above with ( |5.8| ) gives us 

Win - PM,h)fh = 0(A^27-min{™,}||J||V-;,2 ^ j^2^-min{m,} y^^^^ ^2 ^ ||^|| V-;^3) 

= 0(||/||^'^[iV37/^3 ^ ^27-min{m,}^2])^ 

implying 

\\PM,h - VhW^^lo = 0{N^^h^ + iv27— W™.};,2)^ 

as desired. □ 
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5.4 Weak trapezoidal method 



Throughout subsection 5.4, we let Zj/^^p be the weak trapezoidal approximation to , and let 

Ptrap,hfix) = E,f{Zi^.,p{h)), 
where h is the size of the time discretization. We will again only consider the case h < N~'^, which 



is a natural stability condition and is discussed further in Section 6.2 



We make the standing assumption that for all x in our state space of interest, and k,j E 
{1, . . . , R}, we have 



6Afc(x + Cf)-6Afc(x) >0, 



(5.12) 



where > ^2 are defined in (3.9) for some 6 G (0, 1). Noting that will often be small, and that 
^1 — ^2 = 1; for most processes, including those arising from biochemistry, the requirement (5.12) 
holds so long as the process is not directly at the boundary of the positive orthant. Weakening 
(5.12) is almost certainly doable, for example by gaining control over the probability that a process 
leaves a region in which the condition holds. This is an avenue for future work. 

Theorem 5.8. Suppose that the step size h satisfies h < N^"' . Then 

\\{Ptrap,k-V,)\\^Zo = 0{N'^h'). 

Proof. Consider one step of the method with a step-size of size h and with initial value xq- Note 
that the first step of the algorithm produces a value y* that is distributionally equivalent to one 
produced by a Markov process with generator given by 



</(x) = iV^A(xo)-V^V(x). 

Next, given both xq and y* , step 2 produces a value which is distributionally equivalent to one 
produced by a Markov process with generator 



N . 



B^fix) = N^[^^X{y*) - 6A(xo)]+ • V^' f{x). 



N . 



(5.13) 



Recall that for the exact process, 



Vhfixo) = f{xo) + hA^fixo) + y (^^)V(xo) + 0(iV3^||/||3^ h^). 



For the approximate process we have, 
Ptrap,hfixo) = E,,[E,,[f{Zl'(h))\y*]] 



E,,/(y*) + (1 - e)hE,,[B^ fiy*)] + ^ E,,[{B^ff{y*)] + 0{N^^\\f\\T h')- 

(5.14) 



We will expand each piece of (5.14) in turn. Noting that B^ /{xq) = A f{xo), the first term is 



B^f{Zs)ds 



E.o/(2/*) = /(^o)+IE.o 

= /(xo) + 0M^/(xo) + 



N . 



{Bfff{xo) + 0{N^^f\\rh' 
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We turn attention to the second term, (1 — 6)hMxo[B2 f{y*)]i and begin by making the following 
definition: 

g{y*) B^fiy*) = N^[^,Xiy*) - 6A(xo)]+ • V^/(y*), 
so that g{x) = A^i'([^iA(x) - ^^(xo)]^ • V^)/(x). Because 6 - 6 = 1, we have 

g(xo) = N"'X{xo) ■ V^/(xo) = ^^/(xq). 



By our standing assumption (5.12) 

9{xo + Ck) - 9{xo) = iV^(eiA(xo + Ck) - 6A(xo)) • V^/(xo + Ck) - NU{xo) ■ V^/(xo). 
After some algebra 

B^gixo) = N^Xixo) ■ V^5)(^o) = N^Y1 N'" Xk{xoMxo + Ck) - gixo)] 

k 

= CiNUixo) ■ V^(iV^A • /)(xo) - e2iV^A(xo) • V^(A(xo) • /)(xo) 
= ^^/(xo)) - UiB^ff)ixo). 

Thus, 

E.,[B^fiyl]=^Myl]=9ixo) + 0hB^g{xo) + OiN^^\\f\\rh^) 

= A'^fixo) + 9h [UB^A^f){xo) - UB^ffixo)] + 0{N'^\\f\\r h^) 
= A^'fixo) + eh iUA'^ffixo) - UB^ffixo)] + 0{N^''\\f\\rh^), 

where the last line follows since B^ f{xo) = /{xq) for any /. 
Finally, we turn the the last term in (5.14). Define 

qiy*) {B^?f{y*) 

= [iiX{y*) - 6A(xo)]+ • V^([6A(y*) - 6A(xo)]+V^/)(y*), 

so that 

q{x) = [6A - 6A(xo)]+ • V^([6A - 6A(xo)]+V^/)(x). 



By our standing assumption (5.12) we have 

^.,[{B^?f{y*)]=^My*)] 

= q{x^) + 0{N^'<\\f\\Th) (5.15) 
= {B^ff{xo) + 0{N^m\\Th). 

Noting that 

(1-0)^6 = 2 and {i-e)ei2= ^'^~^l^^\ 
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we may conclude the following from the above calculations 



+ 0{N^^\\f\\rh') 
= /(xo) + 9hA''f{xo) + )V(xo) 

+ (1 - e)M^/(xo) + ^M^)V(xo) - y [(1 - Of + e'KB^ffixo) 
+ ^^^|^«)V(xo)+0(iV37||/|ir/^^) 
= /(xo) + ^^/(xo) + ^(-4^)V(xo) + OiN'^WfWrh'). 



Thus 

and the proof is complete. 



(PtrapA -V^)f Wo GO{N^^\\f\\rh'' 



□ 



6 Global Bounds and Stability 



In Section 6.1 we bound , which was the remaining piece to handle in Theorem 4.1 to give 

us global bounds on the weak error induced by the different methods. In Section 6.2, we briefly 
discuss some issues related to stability of the different methods. 



6.1 Bounds on \\Vtf\\^'' 

In this section we bound where n is a nonnegative integer and Vt is the semigroup 



operator (|4.1l) of the scaled process (|1.4|). We point out, however, that for any process for 



which Vt is well behaved, in that ||Pt||^_^o bounded uniformly in A^, the following results are not 
needed, and, in fact, would most likely be a least optimal bound, as the bound grows exponentially 
in N^t. Note that any system satisfying the classical scaling has 7 = 0. We also point out that 
the arguments used below are quite similar to those used in [22] by Hu, Li, and Min, which were 
extensions of those used in [3] by Anderson, Ganguly, and Kurtz. 
For t >0 and any x £ M"^, We define 



Theorem 6.1. // 



< oo, then 

\\v{t,-)\\r = \\m\r< 



n ^ 



where 



Cn 



lAI 



n R + R{n- 1)||A 



In 



(6.1) 



We delay the proof of Theorem 6.1 until the following Lemma is shown, the proof of which is 
similar to that found in [22], which itself was an extension of the proof of Lemma 4.3 in [3]. 
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Lemma 6.2. Given a multiset I o/{l, • • • , R}, there exists a function qi{x) that is a linear function 
of terms of the form Vj^ v{t, x) with \ J\ < \I\, so that 

\l\ 

dtVfvit, x) = N\\ ■ V^)Vf x) + ^(ft • V^)V,\,^7;(f, x + C4) + N^qi{x), 



i=l 



where /3i = V^A. Further, qj consists of at most R{\I\ — 1) terms of the form \/jv{t,x), each of 
whose coefficients are bounded above by ||A||j^^. 

Proof. This goes by induction. For |/| =0, the statement follows because 

dtv{t,x) = N''{X-V^)v{t,x). (6.2) 

Note that in this case, there are no /3j or q terms. It is instructive to perform the |/| = 1 case. We 
have 

dtV^v{t,x) = V^dtv{t,x) 

= V^iNn-V^v{t,x)) 

= N^iV^X ■ V^)v{t, x) + iV^A • V^V^v{t, x) + N^{N-'^V^X ■ V^V^v{t, x)). 
Note that for any 5 : M"^ ^ M 

(Vf A • V^)g{x) + {N-'^'^V^X ■ V^)V^5(x) = (Vf A • V'')g{x + Ck). (6.3) 
Therefore, with g{x) = v{t,x) in the above, we have 

dtV^v{t, x) = N^{X{x) ■ V^)V^v{t, x) + iVT(Vf A(x) • V^)v{t, x + Cfc). 
Now assume that it holds for a set of size < |/|. Then, using the inductive hypothesis. Lemma 



5.2, and equation (6.3) yields 
dtV^Vfvit,x) 



V^dtVfv{t,x) 



N 



ATT 



1^1 



(A • V^)Vf t;(t, x) + Y^iPi ■ V^)V,y^z;(t, x + J + qi{, 



i=l 



(A • V^)Vfufc^^(t, x) + (Vf A • V^)Vf ^;(t, x + a) 
W r 



1=1 L 
+ N^V^qi{x) 



N^{X.V'^)Vl,vit,x)+N'' 



\i\ 



(Vf A • V^) Vfu,\fc^^(t, x + Ck)+ • V^)Vfufcy,^^(t, X + 



1=1 



Vl^g/(x) + (VlV, • V^^)V}\,^^(t, x + Q,+ Ck) 



showing the result. 



□ 
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Proof, {of Theorem 6.1 
Let n > 0. Define 



Unit) = max \Vfv{t,x)\ = \\v\\r ■ 

x,\I\<n 

Each v{t, x) is a continuously differentiable function witli respect to t. Therefore, the maximum 
above is achieved at some {I*,x*) for all t S [0,ti] where ti > 0. Fixing this choice of {I*,x*), we 
have 

Un{t) = V%v{t,X*) 

for all t < ti. 
Note that 

[(A • V^)Vf.^(i, x*)MMt, X*) = Afc(x)(Vf Vf*z;(t, x*))Vf.z;(t, x*) 

k 

= J2 N''Xk{x){VfMt, X* + Ck) - ^?Mt, x*)MMt, x*) (6-4) 

k 

<o, 

where the final inequality holds by the specific choice of /* and x* . Also note that for any ii £ I* 
and any choice of x 



R 



\V''vf,^,^v{t,x)\<^\V,Vf,\,^v{t,x)\<R\VfMt,x*)\. 



(6.5) 



k=l 



From Lemma 6.2 and equations (6.4) and (6.5), we have 



1 



^dt{vfMt,x*)r = {dtV'^Mt,x*))vfMt,xn 



(A • V^'MMt, X*) + ^(A • V^)V,.\,^^;(t, X* + + qi* (x*) 



1=1 



Vf*vit,x*) (6.6) 



r |r| R \V%v{t,x*)\' + R{\r\ - l)||A||^:||Vf.'t;(t,x 



where we have used the fact that each /3j = V^^A for £i £ I*. Setting 



Cn = 2 (||A||7 n R + R{n-1 



n I ! 



(6.7) 



we see by an application of Gronwall's inequality that the conclusion of the theorem holds for all 
t < ti. That is, for t < ti 

Unit) < ll/lire^^^"*. 

To continue, repeat the above argument on the interval [^1,^2), with /*, x* again chosen to maximize 
Un on that interval, and note that 

C/n(ii) < ||/|ire^^^"\ 
so that we may conclude that for ti <t<t2, 

Unit) < ||/||re^'''"*^e^'''"^*"*^^ =- 



Continuing on, we see that t j — )• 00 as i — )• 00 by the boundedness of the time derivatives of vit, x), 
thereby concluding the proof. □ 
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Remark 6.3. In the theorem above, Cn G 

Combining aU of the previous results, we have the following theorems. 

Theorem 6.4. (Global bound for the Euler method) 

Suppose that the step size h satisfies h < N~'^ , and T = nh. Then 

\\{PE.H-VnH)\\i:o = 0{N^^he^^^''^) 



where C2 S 0(||A||2 ) is defined in (6.1). 

Theorem 6.5. (Global bound for the midpoint method) 

Suppose that the step size h satisfies h < N~'^, and T = nh. Then 

where C3 € OdlAUg'^) is defined in ( |6.1[ ). 

The following immediate corollary to the theorem above recovers the result in [3]. 

Corollary 6.6. Under the additional condition h > A^^T^™"^!™*:} in Theorem (6.5), the leading 
order of the error of the midpoint method is 0{h'^). 

Theorem 6.7. (Global bound for the weak trapezoidal method) 

Suppose that the step size h satisfies h < A^^'^, and T = nh. Then 

Ka,,H-VnH\\i:, = 0{h^N'^e^'^^^) 



where C3 E 0(||A||3 ) is defined in (6.1). 

Thus, we see that the weak trapezoidal method detailed in Algorithm [3] is the only method that 
boasts a global error of second order in the stepsize h in an "honest sense." That is, it is a second 
order method regardless of the relation of h with respect to A^. This is in contrast to the midpoint 
method which has second order accuracy only when the order of h is larger than JV-7-mm{mfc}_ 



6.2 Stability Concerns 

The main results and proofs of our paper have incorporated stability concerns into the analysis. 
This is seen in the statements of the theorems by the running condition that h < N~"' , where we 
recall that N'^ should be interpreted as the time-scale of the system. Without this condition, the 
methods are unstable. It is an interesting question, and the subject of future work, to determine 
the stability properties of other methods in this setting. 
As an instructive example, again consider the system 

100 
5*1 <^ iS'2 
100 

with Xi{0) = X2{0) = 10,000. In this case, it is natural to take N = 10,000. As the rate constants 
are 100 = ^10,000, we take /3i = (32 = 1/2 and find that 7 = 1/2. The equation governing the 
normalized process Xf IS 

Xf{t)=X^{0)-Yi(^N^/^Nj\(^{s)ds^^+Y2(^N^/^Nj\2-X^{s))ds 

where we have used that X^ + X^ = 2. It is now clear that if the condition h < N "'is violated 
a path generated by any of the explicit methods discussed in this paper will behave quite poorly. 
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Figure 1: The log-log plot of |E[X|(2)] — E[Z|(2)]| against h for the three approximation methods. 
The slope for Euler's method is 1.21, whereas the slope for the weak trapezoidal solution with 
^? = 1/2 is 3.06, which is better than expected. The curve governing the solution from the midpoint 



method appears to not be linear; a behavior predicted by Theorem 5.6 



7 Examples 

We provide two test systems. The first is a simple linear system with three species that we will use 
to demonstrate our main analytical results. The second is a gene-protien-mRNA model we will use 
to demonstrate the capabilities of the different methods on an actual test problem. We note that 
in all simulations of the weak trapezoidal algorithm, we chose 9 = 1/2. 

Example 1. Consider the following first order reaction network 

with Ki = 0.03, K2 = 1, /^s = 0.1, and K4 = 1. Starting from the initial state 

X{0) = (Xa(0),Xb(0),Xc(0)) = (13000,100,20), 

where we make the obvious associations Xi = X^, X2 = Xb, and X3 = Xc- We approximate ^(2) 
using the three methods considered in this paper: Euler, midpoint, and weak trapezoidal with a 
choice of ^ = 1/2. For first order systems, we may find the first moments and the covariances of 
X{t) as solutions of linear ODEs using a Moment Generating function approach [J5j . 

In Figure[l| we show a log-log plot of |E[X|(2)] — E[Z|(2)]| against h for the three approximation 
methods. Each data point was found from either 10^,2.9 x 10^,3.9 x 10^,4.9 x 10^,8 x 10^, or 
10^ independent simulations, with the number of simulations depending upon the size of h and 
the method being used. The slope for Euler's method is 1.21, whereas the slope for the weak 
trapezoidal solution is 3.06, which is better than expected. The curve governing the solution from 



the midpoint method appears to not be linear; a behavior predicted by Theorem 5.6 

In Figure [2] we again consider the log-log plots of |E[X3(2)] — E[Z3(2)]| against h, but now only 
for Euler's method and the midpoint method so that we may see the change in behavior in the 
midpoint method predicted in Theorem |5.6[ In (a), we see that for larger h the slope generated via 
the midpoint method is 2.03, whereas in (b) the slope is 1.12 when h is smaller. For reference, in 
(a) the slope generated by Euler's method is 1.366, whereas in (b) it is 1.09. 
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While the simulations make no use of the scalings inherent in the system, it is instructive 
for us to quantify them in this example so that we are able to understand the behavior of the 
midpoint method. We have N ^ 10^, ai = 1,02 = 1/2, 03 = 1/4, and = 1/4. Also, 7 0. 
Therefore, Theorem |5.6| predicts the midpoint method will behave as an order two method if 
h > N-""/^ PS 1/10, or if log(/i) > -2.3, which roughly agrees with what is shown in Figures [I] and 
[2] Note that Theorem |5.6| will never provide a sharp estimate as to when the behavior will change 
as it is a local result and the scalings in the system will change during the course of a simulation. 

The fact that the trapezoidal method gave an order three convergence rate above does not hold 
in general. This was demonstrated in the proof of Theorem \5.8\ but it is helpful to also show this 
via example. In Figure [3]we present a log-log plot of |EX2(2) — KZ2{2)\ for the different algorithms 
on this same example. The approximate slopes are: 1.02 for Euler's methods, 2.372 for midpoint 
method, and 2.3 for the trapezoidal method. We point out that all of the plots above represent 
results pertaining to the non-normalized processes as the simulation methods themselves make no 
use of the scalings. 

Example 2. Consider a model of gene transcription and translation: 

G^G + M, M^^°M + P, 2P°-^^D, M ^ 0, P 4 0. 

Here a single gene is being translated into mRNA, which is then being transcribed into proteins, 
and finally the proteins produce stable dimers. The final two reactions represent degradation of 
mRNA and proteins, respectively. Suppose we start with one gene and no other molecules, and 
want to estimate the expected number of dimers at time T = 1 to an accuracy of it 1.0 with 95% 
confidence. Therefore, we want the variance of our estimator to be smaller than (1/1.96)^ = .2603. 

While e = 1 for the unsealed version of this problem, the simulation of just a few paths of the 
system will show that there will be somewhere in the magnitude of 3,500 dimers at time T = 1. 
Therefore, for the scaled system, we are asking for an accuracy of ?= 1/3500 ~ 0.0002857. Also, 
a few paths (100 is sufficient) shows that the order of magnitude of the variance of the normalized 
number of dimers is approximately 0.11. Thus, the approximate number of exact sample paths we 
will need to generate can be found by solving 

—Var (normalized # dimers) = (?/1.96)^ =^ n = 5.18 x 10^. 
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Log h 



Figure 3: Log-log plot of |E[X3(2)] — E[Z3(2)]| against h for the three approximation methods. 
The approximate slopes are: 1.02 for Euler's methods, 2.372 for midpoint method, and 2.3 for the 
trapezoidal method. 



Approximation 


# paths 


CPU Time 


Variance of estimator 


# updates 


3714.2 ± 1.0 


4,740,000 


149,000 CPU S 


0.25995 


8.27 xlO^" 



Table 1: Performance of Exact algorithm with crude Monte Carlo estimator (1.2). 



Therefore, we will need approximately five million independent sample paths generated via an exact 
algorithm. Implementing the modified next reaction method [T] on our machine (using Matlab), 
each path takes approximately 0.03 CPU seconds to generate. Therefore, the approximate amount 
of time to solve this particular problem will be 155,000 CPU S, which is about forty three hours. 
The outcome of such a simulation is detailed in Table [T] where updates" refers to the total 
number, over all paths, of updates to the system performed, and random variables generated, 
and is used as a quantification for the computational complexity of the different methods under 
consideration. In terms of software and hardware, the authors used Matlab for all computations, 
which were performed on an Apple machine with a 2.2 GHz Intel 17 processor. 

Next, we solved the problem using Euler's method, the approximate midpoint method, and 
the weak trapezoidal method with 9 = 1/2. We note that for each of the three approximations, 
we used the most naive implementation possible by simply setting the value of any component 
that goes negative in the course of a step to zero, and by using a fixed step size, h > 0. Thus, 
improvements can be gained on the stated results by using a more sophisticated implementation 
[21 [H]- However, we did produce our approximate paths in batches of 50,000, which greatly reduces 
the cost of generating the Poisson random variables with the built in Matlab Poisson random 
number generator. 

In Table [2] we provide data on the performance of Euler's method with various step-sizes, 
combined with a crude Monte Carlo estimator (1.8). Note that the bias in Euler's method is 
apparent even for very small h. In Table [3] we provide data on the performance of the midpoint 
method with various step-sizes, combined with a crude Monte Carlo estimator (1.8). Note that 
the solution has a much higher variance when h = 1/3, thereby necessitating significantly more 
paths to get a desired tolerance. This demonstrates the stability concerns discussed in Section [6. 2[ 
This problem does not arise as much when using the weak trapezoidal method. In Table |4] we 
provide data on the performance of the weak trapezoidal method with various step-sizes, combined 
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Step-size 


Approximation 


# paths 


CPU Time 


Variance of estimator 


# updates 


h = 3-' 


3,712.3 ± 1.0 


4,750,000 


13,374.6 CPU S 


0.25898 


6.2 X 10^" 


h = 3-^ 


3,707.5 ± 1.0 


4,750,000 


6,207.9 CPU S 


0.25839 


2.1 X 10^" 


h = 3-^ 


3,693.4 ± 1.0 


4,700,000 


2,803.9 CPU S 


0.26018 


6.9 X 10^ 


h = 3-^ 


3,654.6 ± 1.0 


4,650,000 


1,219 CPU S 


0.25940 


2.6 X 10^ 



Table 2: Performance of Euler's method with crude Monte Carlo. 



Step-size 


Approximation 


# paths 


CPU Time 


Variance of estimator 


# updates 


h = 3-^ 


3,713.6 ± 1.0 


4,650,000 


1,269.1 CPU S 


0.25996 


2.3 X 10^ 


h = 3-^ 


3,713.9 ± 1.0 


4,500,000 


497.5 CPU S 


0.25860 


7.6 X 10^ 


h = 3-'' 


3,722.4 ± 1.0 


4,050,000 


177.6 CPU S 


0.25972 


2.2 X 10*^ 


h = 3-' 


3,986.1 ± 1.0 


18,500,000 


376.0 CPU S 


0.26020 


3.3 X 10*^ 



Table 3: Performance of midpoint method with crude Monte Carlo. 



with a crude Monte Carlo estimator (1.8). We see that for this example the midpoint method 
and the weak trapezoidal method are, overall, comparable. However, the weak trapezoidal method 
performs, in terms of bias and required CPU time, significantly better than does the midpoint 
method for h = 1/3. 

It is worth noting that both the midpoint and weak trapezoidal methods compare decently on 
this example with the multi-level Monte Carlo method developed recently for stochastic chemical 
kinetic systems |1| . The choice of which method (an explicit solver discussed herein or a multi- level 
Monte Carlo solver) a user wishes to implement will therefore often be problem, and user, specific. 

We next used each of the methods above to estimate the probability that the number of dimers 
at time 1 is greater than or equal to 6,000. Note that this probability is the expected value 
of the indicator function l{XDimcr(i)>6.ooo}- The results are presented in Table [5| which provides 
95% confidence intervals for a few choices of h for each method. Note that in computing this 
approximation the weak trapezoidal method has significantly less bias than does the midpoint 
method for comparable step-sizes, making it the method of choice for this particular choice of 
function /. The necessary CPU time for each of the methods is the same as those reported above. 
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Step-size 


Approximation 


# paths 


CPU Time 


Variance of estimator 


# updates 


h = 3-^ 


3,714.4 ± 1.0 


4,750,000 


2,120.5 CPU S 


0.25940 


4.6 X lO'^ 


h = 3-^ 


3,714.6 ± 1.0 


4,750,000 


898.2 CPU S 


0.25940 


1.6 X lO'^ 


h = 3-' 
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4,800,000 


349.8 CPU S 


0.25965 


5.2 X 10*^ 


h = 3-' 
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238.2 CPU S 


0.25944 


3.2 X 10^ 



Table 4: Performance of weak trapezoidal method with 9 = 1/2, with crude Monte Carlo. 
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Method 


Step-size 


# paths 


Approximation 


Exact 


N.A. 


4,520,000 


0.02843 ± 0.00015 


Euler 


h = 3-' 


4,750,000 


0.02818 ± 0.00015 


Euler 


h = 3-*^ 


4,750,000 


0.02782 ± 0.00015 


Midpoint 


h = 3-^ 


4,650,000 


0.02718 ± 0.00015 


Midpoint 
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Weak Trap, 6* = 1/2 
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4,750,000 
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Weak Trap, 6* = 1/2 
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Table 5: Approximation of P{XDimer(l) > 6,000} using different methods and different step sizes. 
As expected, the weak trapezoidal method demonstrates significantly less bias than do the Euler 
and midpoint methods. 
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